Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms
Many policy-based reinforcement learning (RL) algorithms can be viewed as instantiations of approximate policy iteration (PI), i.e., algorithms in which both policy improvement and policy evaluation are performed approximately. In applications where the average reward objective is the meaningful performance metric, discounted reward formulations are often used with a discount factor close to 1, which is equivalent to making the expected horizon very long.
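As a standard reminder (not taken from this abstract), the equivalence between a discount factor close to 1 and a very long expected horizon follows from reading $\gamma$ as a per-step continuation probability, so the horizon $T$ is geometrically distributed:

\[
\mathbb{E}[T] \;=\; \sum_{t=1}^{\infty} t\,(1-\gamma)\,\gamma^{\,t-1} \;=\; \frac{1}{1-\gamma}, \qquad \text{e.g.}\ \gamma = 0.99 \;\Rightarrow\; \mathbb{E}[T] = 100.
\]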
Characterizing GPU Resilience and Impact on AI/HPC Systems
Cui, Shengkun, Patke, Archit, Chen, Ziheng, Ranjan, Aditya, Nguyen, Hung, Cao, Phuong, Jha, Saurabh, Bode, Brett, Bauer, Gregory, Narayanaswami, Chandra, Sow, Daby, Di Martino, Catello, Kalbarczyk, Zbigniew T., Iyer, Ravishankar K.
In this study, we characterize GPU failures in Delta, a current large-scale AI system with over 600 petaflops of peak compute throughput. The system comprises GPU and non-GPU nodes with modern AI accelerators, such as NVIDIA A40, A100, and H100 GPUs. The study uses two and a half years of data on GPU errors. We evaluate the resilience of GPU hardware components to determine the vulnerability of different GPU components to failure and their impact on GPU and node availability. We measure the key error propagation paths in GPU hardware, GPU interconnect (NVLink), and GPU memory. Finally, we evaluate the impact of the observed GPU errors on user jobs. Our key findings are: (i) Contrary to common belief, GPU memory is over 30x more reliable than GPU hardware in terms of MTBE (mean time between errors). (ii) The newly introduced GSP (GPU System Processor) is the most vulnerable GPU hardware component. (iii) NVLink errors did not always lead to user job failure, which we attribute to the underlying error detection and retry mechanisms. (iv) We show multiple examples of hardware errors originating in key GPU hardware components and leading to application failures. (v) We project the impact of GPU node availability at larger scales with emulation and find that significant overprovisioning of 5-20% would be necessary to handle GPU failures; if GPU availability were improved to 99.9%, the required overprovisioning would be reduced by 4x.
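For concreteness, finding (i) rests on the MTBE metric, which can be estimated as observed GPU-hours divided by the number of errors in that window; the short sketch below is a generic illustration, not the authors' tooling, and the quantities in it are hypothetical.

    # Generic MTBE sketch; the figures below are illustrative, not Delta's data.
    def mtbe_hours(total_gpu_hours: float, num_errors: int) -> float:
        """Mean time between errors = observed GPU-hours / error count."""
        return total_gpu_hours / max(num_errors, 1)

    # e.g. 1,000 GPUs observed for ~2.5 years (~21,900 hours each)
    gpu_hours = 1_000 * 21_900
    print(mtbe_hours(gpu_hours, num_errors=500))  # ~43,800 GPU-hours per error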
Reparametrization of 3D CSC Dubins Paths Enabling 2D Search
Xu, Ling, Baryshnikov, Yuliy, Sung, Cynthia
This paper addresses the Dubins path planning problem for vehicles in 3D space. In particular, we consider the problem of computing CSC paths -- paths that consist of a circular arc (C) followed by a straight segment (S) followed by a circular arc (C). These paths are useful for vehicles such as fixed-wing aircraft and underwater submersibles that are subject to lower bounds on turn radius. We present a new parameterization that reduces the 3D CSC planning problem to a search over 2 variables, thus lowering search complexity, while also providing gradients that assist that search. We use these equations with a numerical solver to explore the number and types of solutions computed for a variety of planar and 3D scenarios. Our method successfully computes CSC paths for the large majority of test cases, indicating that it could be useful for the future generation of robust, efficient curvature-constrained trajectories.
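Since the abstract does not spell out the two search variables, the sketch below only illustrates the kind of gradient-assisted 2-variable search such a parameterization enables; csc_path_length is a smooth placeholder, not the paper's actual cost.

    # Hypothetical 2-variable search sketch; csc_path_length is a placeholder
    # objective standing in for the reparameterized CSC path length.
    import numpy as np
    from scipy.optimize import minimize

    def csc_path_length(x):
        t1, t2 = x  # the two free variables of the reduced problem
        return 2.0 + np.cos(t1 - 0.3) ** 2 + np.sin(t2 + 0.1) ** 2

    res = minimize(csc_path_length, x0=[0.0, 0.0], method="L-BFGS-B",
                   bounds=[(-np.pi, np.pi), (-np.pi, np.pi)])
    print(res.x, res.fun)  # candidate (t1, t2) and its path length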
Moss: Proxy Model-based Full-Weight Aggregation in Federated Learning with Heterogeneous Models
Cai, Yifeng, Zhang, Ziqi, Li, Ding, Guo, Yao, Chen, Xiangqun
Modern Federated Learning (FL) has become increasingly essential for handling highly heterogeneous mobile devices. Current approaches adopt a partial-model aggregation paradigm that leads to sub-optimal model accuracy and higher training overhead. In this paper, we challenge the prevailing notion of partial-model aggregation and propose a novel "full-weight aggregation" method named Moss, which aggregates all weights within heterogeneous models to preserve comprehensive knowledge. Evaluation across various applications demonstrates that Moss significantly accelerates training, reduces on-device training time and energy consumption, enhances accuracy, and minimizes network bandwidth utilization when compared to state-of-the-art baselines.
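To contrast with partial aggregation, the sketch below shows generic full-weight, size-weighted averaging across clients; it does not reproduce Moss's proxy-model mapping for heterogeneous architectures, and the client states are hypothetical.

    # Generic full-weight aggregation sketch (FedAvg-style), NOT Moss's
    # proxy-model mechanism; client states and sizes are made up.
    import numpy as np

    def aggregate_full_weights(client_states, client_sizes):
        """Average every parameter tensor, weighted by local dataset size."""
        total = sum(client_sizes)
        return {k: sum(n * s[k] for n, s in zip(client_sizes, client_states)) / total
                for k in client_states[0]}

    clients = [{"layer1": np.full((2, 2), v)} for v in (1.0, 2.0, 3.0)]
    print(aggregate_full_weights(clients, client_sizes=[10, 20, 30]))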
Towards Efficient Large Scale Spatial-Temporal Time Series Forecasting via Improved Inverted Transformers
Sun, Jiarui, Yeh, Chin-Chia Michael, Fan, Yujie, Dai, Xin, Fan, Xiran, Jiang, Zhimeng, Saini, Uday Singh, Lai, Vivian, Wang, Junpeng, Chen, Huiyuan, Zhuang, Zhongfang, Zheng, Yan, Chowdhary, Girish
Time series forecasting at scale presents significant challenges for modern prediction systems, particularly when dealing with large sets of synchronized series, such as in a global payment network. In such systems, three key challenges must be overcome for accurate and scalable predictions: 1) emergence of new entities, 2) disappearance of existing entities, and 3) the large number of entities present in the data. The recently proposed Inverted Transformer (iTransformer) architecture has shown promising results by effectively handling variable entities. However, its practical application in large-scale settings is limited by quadratic time and space complexity ($O(N^2)$) with respect to the number of entities $N$. In this paper, we introduce EiFormer, an improved inverted transformer architecture that maintains the adaptive capabilities of iTransformer while reducing computational complexity to linear scale ($O(N)$). Our key innovation lies in restructuring the attention mechanism to eliminate redundant computations without sacrificing model expressiveness. Additionally, we incorporate a random projection mechanism that not only enhances efficiency but also improves prediction accuracy through better feature representation. Extensive experiments on the public LargeST benchmark dataset and a proprietary large-scale time series dataset demonstrate that EiFormer significantly outperforms existing methods in both computational efficiency and forecasting accuracy. Our approach enables practical deployment of transformer-based forecasting in industrial applications where handling time series at scale is essential.
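As a rough illustration of why a random projection can break the $O(N^2)$ barrier (a generic construction, not EiFormer's exact attention restructuring; all shapes are made up), attending to $k \ll N$ randomly projected summary tokens costs $O(Nk)$ instead of $O(N^2)$:

    # Generic O(N*k) attention illustration via random projection; not the
    # paper's exact mechanism, and all shapes/values are hypothetical.
    import numpy as np

    rng = np.random.default_rng(0)
    N, d, k = 10_000, 64, 128                       # entities, model dim, projection size
    X = rng.standard_normal((N, d))                 # per-entity token embeddings

    P = rng.standard_normal((N, k)) / np.sqrt(k)    # random projection over entities
    anchors = P.T @ X                               # (k, d) summary tokens

    scores = X @ anchors.T / np.sqrt(d)             # (N, k) instead of (N, N)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    out = attn @ anchors                            # (N, d), linear in N
    print(out.shape)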
The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own
Puthumanaillam, Gokul, Ornik, Melkior
This paper presents a comprehensive investigation into the capability of Large Language Models (LLMs) to successfully complete a semester-long undergraduate control systems course. Through evaluation of 115 course deliverables, we assess LLM performance using ChatGPT under a "minimal effort" protocol that simulates realistic student usage patterns. The investigation employs a rigorous testing methodology across multiple assessment formats, from auto-graded multiple choice questions to complex Python programming tasks and long-form analytical writing. Our analysis provides quantitative insights into AI's strengths and limitations in handling mathematical formulations, coding challenges, and theoretical concepts in control systems engineering. The LLM achieved a B-grade performance (82.24\%), approaching but not exceeding the class average (84.99\%), with strongest results in structured assignments and greatest limitations in open-ended projects. The findings inform discussions about course design adaptation in response to AI advancement, moving beyond simple prohibition towards thoughtful integration of these tools in engineering education. Additional materials including syllabus, examination papers, design projects, and example responses can be found at the project website: https://gradegpt.github.io.
APECS: Adaptive Personalized Control System Architecture
Juston, Marius F. R., Gisi, Alex, Norris, William R., Nottage, Dustin, Soylemezoglu, Ahmet
This paper presents the Adaptive Personalized Control System (APECS) architecture, a novel framework for human-in-the-loop control. An architecture is developed that defines appropriate constraints for the system objectives. A method for enforcing Lipschitz and sector bounds on the resulting controller is derived to ensure desirable control properties. An analysis of worst-case loss functions and the optimal loss function weighting is performed to implement an effective training scheme. Finally, simulations are carried out to demonstrate the effectiveness of the proposed architecture. This architecture resulted in a 4.5% performance increase compared to the human operator and a 9% increase compared to an unconstrained feedforward neural network trained in the same way.
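The abstract does not give the exact constraint-enforcement mechanism, so the sketch below shows one common way to impose a Lipschitz bound on a feedforward controller, by clipping each layer's spectral norm; it is an assumption-laden stand-in, not APECS's derived method.

    # One generic way to impose a Lipschitz bound on a feedforward network:
    # clip each layer's spectral norm so the product stays below a target.
    # This is NOT necessarily the paper's enforcement method.
    import numpy as np

    def clip_spectral_norm(W, max_sn):
        """Rescale W so its largest singular value is at most max_sn."""
        sn = np.linalg.norm(W, ord=2)
        return W if sn <= max_sn else W * (max_sn / sn)

    rng = np.random.default_rng(1)
    layers = [rng.standard_normal((16, 8)), rng.standard_normal((1, 16))]
    L_target = 2.0                                  # desired network-level bound
    per_layer = L_target ** (1 / len(layers))       # split the bound across layers
    layers = [clip_spectral_norm(W, per_layer) for W in layers]
    print([round(float(np.linalg.norm(W, 2)), 3) for W in layers])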
Uncertainty Quantification From Scaling Laws in Deep Neural Networks
Elsharkawy, Ibrahim, Kahn, Yonatan, Hooberman, Benjamin
Deep learning techniques have improved performance beyond conventional methods in a wide variety of tasks. However, for neural networks in particular, it is not straightforward to assign a network-induced uncertainty to their outputs as a function of network architecture, training algorithm, and initialization [1]. One approach to uncertainty quantification (UQ) is to treat any individual network as a draw from an ensemble, and to identify the systematic uncertainty with the variance of the neural network outputs over the ensemble [2, 3]. This variance can certainly be measured empirically by training a large ensemble of networks, but it would be advantageous to be able to predict it from first principles. This is possible in the infinite-width limit of multi-layer perceptron (MLP) architectures, where the statistics of the network outputs after training are Gaussian with mean and variance determined by the neural tangent kernel (NTK) [4-6]. For realistic MLPs with large but finite width n, one can compute corrections to this Gaussian distribution that are perturbative in 1/n [7].
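The ensemble-based UQ recipe in the second sentence can be written down in a few lines; in the sketch below the "trained networks" are random linear stand-ins (hypothetical), and only the mean/variance bookkeeping reflects the approach described.

    # Ensemble-variance UQ sketch; fake_trained_network is a stand-in for a
    # trained MLP, so only the mean/variance bookkeeping is the point here.
    import numpy as np

    x = np.linspace(-1.0, 1.0, 50)

    def fake_trained_network(seed):
        r = np.random.default_rng(seed)
        w, b = 1.0 + 0.05 * r.standard_normal(), 0.05 * r.standard_normal()
        return lambda x: w * x + b

    ensemble = [fake_trained_network(s) for s in range(100)]
    preds = np.stack([f(x) for f in ensemble])            # (ensemble, inputs)
    mean, var = preds.mean(axis=0), preds.var(axis=0)     # variance = systematic UQ
    print(mean[:3], var[:3])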
TRACE: A Self-Improving Framework for Robot Behavior Forecasting with Vision-Language Models
Puthumanaillam, Gokul, Padrao, Paulo, Fuentes, Jose, Thangeda, Pranay, Schafer, William E., Song, Jae Hyuk, Jagdale, Karan, Bobadilla, Leonardo, Ornik, Melkior
Predicting the near-term behavior of a reactive agent is crucial in many robotic scenarios, yet remains challenging when observations of that agent are sparse or intermittent. Vision-Language Models (VLMs) offer a promising avenue by integrating textual domain knowledge with visual cues, but their one-shot predictions often miss important edge cases and unusual maneuvers. Our key insight is that iterative, counterfactual exploration--where a dedicated module probes each proposed behavior hypothesis, explicitly represented as a plausible trajectory, for overlooked possibilities--can significantly enhance VLM-based behavioral forecasting. We present TRACE (Tree-of-thought Reasoning And Counterfactual Exploration), an inference framework that couples tree-of-thought generation with domain-aware feedback to refine behavior hypotheses over multiple rounds. Concretely, a VLM first proposes candidate trajectories for the agent; a counterfactual critic then suggests edge-case variations consistent with partial observations, prompting the VLM to expand or adjust its hypotheses in the next iteration. This creates a self-improving cycle where the VLM progressively internalizes edge cases from previous rounds, systematically uncovering not only typical behaviors but also rare or borderline maneuvers, ultimately yielding more robust trajectory predictions from minimal sensor data. We validate TRACE on both ground-vehicle simulations and real-world marine autonomous surface vehicles. Experimental results show that our method consistently outperforms standard VLM-driven and purely model-based baselines, capturing a broader range of feasible agent behaviors despite sparse sensing. Evaluation videos and code are available at trace-robotics.github.io.
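The propose-critique loop described above can be summarized as a small skeleton; propose_trajectories and counterfactual_critic below are stubs standing in for the VLM and the critic module, so this shows only the shape of the iteration, not TRACE's implementation.

    # Skeleton of the iterative propose/critique cycle; both functions are
    # hypothetical stubs, not the paper's VLM prompts or critic.
    def propose_trajectories(observations, feedback=None):
        base = [("straight", 1.0), ("turn_left", 0.7)]   # (hypothesis, confidence)
        return base + (feedback or [])

    def counterfactual_critic(hypotheses, observations):
        # Suggest an edge case the current hypotheses do not yet cover.
        if all(name != "abrupt_stop" for name, _ in hypotheses):
            return [("abrupt_stop", 0.4)]
        return []

    observations = ["sparse_fix_t0", "sparse_fix_t5"]
    feedback = None
    for _ in range(3):                                   # a few refinement rounds
        hypotheses = propose_trajectories(observations, feedback)
        feedback = counterfactual_critic(hypotheses, observations)
        if not feedback:
            break
    print(hypotheses)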